AMENDMENTS TO THE SPECIFICATION:

In the latest Office Action, the Examiner indicated that various of the paragraphs 
below, previously added by Applicants' previous amendment, raise the issue of new matter. 

Although Applicants do not agree, in an effort to expedite prosecution, the five 
paragraphs, previously incorporated by reference from these co-pending applications, are 
now further revised, as follows: 

The present invention includes using data stored in non-standard format, including, 
more particularly, the non-standard format described in co-pending application 10/671,888, 
referred to herein as "register block" format. Non-standard format, in the context of the 
present invention as referring to the register block format, means storing the data of one or 
more of the three matrices involved in Level 3 processing contiguously in some 
representation (permutation) that has optimal advantage for the L1 cache-FPU register 
interface of a particular architecture, rather than in the standard row-major or column-major 
format conventionally used to store matrix data. 

The present invention also is directed to Single Instruction, Multiple Data (SIMD) 
machines, where k > 1 indicates a number of data capable of being simultaneously moved in 
a single instruction. 

The register block data format exemplarily used in the present invention involves 
blocks of matrix data of size p-by-q, where p and q are small integers, so that the pieces of 
these blocks can be fitted into the registers of a particular architecture to achieve a desirable 
data format stored in these registers. The layout of these blocks is arbitrary. In usual cases, 
the p-by-q sub-blocks will be laid out in either row-major or column-major format. But a key idea 
is that the arbitrary layout of these blocks is tailored to the architectural design of the FPU 
and its associated floating point registers (FPRs). It should be apparent that different 
architectural or instruction set scenarios would require the blocks to be laid out 
differently. It is intended that these different layouts, as required by unique 
architectural/instruction set combinations, are considered to be in the scope of the present 
invention, since the register block format is intended to refer to the concept of dividing matrix 
data into blocks and shifting locations of these blocks to overcome "deficiencies" in 
computer architectural or instruction set design. 
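
By way of illustration only, the concept of dividing matrix data into p-by-q blocks and 
relocating those blocks so that each is contiguous can be sketched as follows. The function 
name pack_register_blocks, the choice of column-major order within each block, the ordering 
of the blocks by block column, and the requirement that p divide M and q divide N are all 
assumptions made for this sketch. They represent one of the arbitrary layouts contemplated 
above and are not the specific layout of the co-pending application. The input is assumed 
to hold element A(i,j) at offset i + LDA*j.

```python
def pack_register_blocks(a, m, n, lda, p, q):
    """Copy an m-by-n column-major matrix into contiguous p-by-q blocks.

    a   : flat sequence holding A(i,j) at index i + lda*j (zero-based)
    p,q : block dimensions; assumed here to divide m and n exactly
    Returns a new flat list in which each p-by-q block occupies p*q
    consecutive locations, stored column-major inside the block.
    """
    if m % p != 0 or n % q != 0:
        raise ValueError("this sketch assumes p divides m and q divides n")
    if lda < m:
        raise ValueError("leading dimension must be at least m")

    packed = []
    for jb in range(0, n, q):              # block columns, left to right
        for ib in range(0, m, p):          # block rows, top to bottom
            for j in range(jb, jb + q):    # columns inside the block
                for i in range(ib, ib + p):  # rows inside the block
                    packed.append(a[i + lda * j])
    return packed
```

For a 4-by-4 matrix with p = q = 2, the four elements of the upper-left block, which sit at 
offsets 0, 1, 4 and 5 in the standard format, become the first four consecutive elements of 
the packed array.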

All modern programming languages (C, Fortran, etc.) store matrices as two-dimensional 
arrays. However, this layout can be shown to be one-dimensional. That is, let 
matrix A have M rows and N columns. The standard column major format of A is as follows. 

Each of the N columns of A is stored as a contiguous vector (stride 1). Each of the M 
rows of A is stored with consecutive elements separated by LDA (Leading Dimension of A) 
storage locations (stride LDA). Let A(0,0) be stored in memory location a. The matrix 
element A(i,j) is stored in memory location a + i + LDA*j. It is important to note here that 
stride 1 is optimal for memory accesses and that stride LDA is poor. Also, almost all Level 3 
linear algebra code treats rows and columns about equally. 
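
The addressing rule above can be illustrated with a minimal sketch. The function names 
column_major_offset and to_column_major are hypothetical and are introduced only for this 
example; indices are zero-based, and the returned offset is relative to location a.

```python
def column_major_offset(i, j, lda):
    """Offset of A(i,j) from the location of A(0,0) in column major format."""
    return i + lda * j


def to_column_major(rows, lda):
    """Flatten a list-of-rows matrix into a one-dimensional column major list.

    lda is the leading dimension and must be at least the number of rows M.
    Locations between M and lda in each column are padding, left as None.
    """
    m, n = len(rows), len(rows[0])
    if lda < m:
        raise ValueError("leading dimension must be at least the row count")
    buf = [None] * (lda * n)
    for j in range(n):
        for i in range(m):
            buf[column_major_offset(i, j, lda)] = rows[i][j]
    return buf
```

Consecutive elements of a column differ in offset by 1, and consecutive elements of a row 
differ in offset by LDA, which is the stride-1 versus stride-LDA distinction drawn above.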